PageRank Computation Using PC Cluster
Abstract
Link-based analysis of web graphs has been explored extensively in many research projects. PageRank computation is one widely known approach and forms the basis of Google search. PageRank assigns a global importance score to a web page based on the importance of the other web pages that point to it. It is an iterative algorithm applied to a massively connected graph of several hundred million nodes and hyperlinks. In this paper, we propose an efficient implementation of PageRank computation for a large sub-graph of the web on a PC cluster. We discuss a link structure file representing a web graph of several hundred million links, together with an efficient PageRank algorithm that computes PageRank scores quickly. Experimental results on a small cluster of x86-based PCs, using an artificial graph of 776 million links among 87 million nodes derived from the TH domain, show a running time of 30.77 seconds per iteration.
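To make the iteration described above concrete, here is a minimal single-machine sketch in Python. It assumes the common damped formulation PR(v) = (1 - d)/N + d * sum over in-links u of PR(u)/outdeg(u), with d = 0.85 and the rank of pages without out-links spread uniformly over all pages. The CSR-style `offsets`/`targets` arrays and the function names `build_link_structure` and `pagerank` are hypothetical illustrations; they are not the paper's link structure file format or its cluster implementation.

```python
from typing import List, Tuple


def build_link_structure(num_nodes: int, edges: List[Tuple[int, int]]):
    """Group out-links by source node into CSR-style arrays.

    Hypothetical layout for illustration: the out-links of node v are
    targets[offsets[v]:offsets[v + 1]].
    """
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    offsets = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        offsets[v + 1] = offsets[v] + out_degree[v]
    targets = [0] * len(edges)
    fill = offsets[:-1]  # slicing copies the list: next free slot per source
    for src, dst in edges:
        targets[fill[src]] = dst
        fill[src] += 1
    return offsets, targets


def pagerank(num_nodes: int, offsets: List[int], targets: List[int],
             damping: float = 0.85, tol: float = 1e-10,
             max_iter: int = 200) -> List[float]:
    """Power iteration over the link arrays (assumed damped PageRank model)."""
    n = num_nodes
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [0.0] * n
        dangling_mass = 0.0
        for v in range(n):
            start, end = offsets[v], offsets[v + 1]
            if start == end:
                # No out-links: collect this rank for uniform redistribution.
                dangling_mass += rank[v]
                continue
            share = rank[v] / (end - start)
            for i in range(start, end):
                new_rank[targets[i]] += share
        # Teleportation term plus the uniformly spread dangling rank.
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = [base + damping * r for r in new_rank]
        diff = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if diff < tol:  # L1 change between iterations is small enough
            break
    return rank


if __name__ == "__main__":
    offs, tgts = build_link_structure(4, [(0, 1), (1, 2), (2, 0), (2, 1), (3, 2)])
    print(pagerank(4, offs, tgts))
```

Each iteration is one sequential pass over the link arrays, which is why a compact link representation matters at the scale of hundreds of millions of links. On a cluster, a typical approach is to partition the nodes so that each machine accumulates the contributions for its share and exchanges partial sums between iterations. This is a general description and not necessarily the paper's design.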
Similar resources
Hypergraph Partitioning for Faster Parallel PageRank Computation
The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation ...
Privacy Preserving PageRank Algorithm By Using Secure Multi-Party Computation
In this work, we study the problem of privacy-preserving computation of the PageRank algorithm. The idea is to carry out the secure multi-party computation of the algorithm iteratively using homomorphic encryption based on the Paillier scheme. In the proposed PageRank computation, a user encrypts its own graph data using an asymmetric encryption method and sends the data set to different parties in a privacy-...
Sentence Clustering using PageRank Topic Model
Clusters of review sentences about the viewpoints in product evaluations can be put to various uses. Topic models, for example the Unigram Mixture (UM), can be used for this task. However, there are two problems. One problem is that topic models depend on randomly initialized parameters, so computation results are not consistent. The other is that the number of topics has to be s...
Piccolo: Building Fast, Distributed Programs with Partitioned Tables
Many applications can see massive speedups by distributing their computation across multiple machines. However, as the number of machines increases, so does the difficulty of writing efficient programs: users must tackle the problem of minimizing communication and synchronization performed between hosts while also taking care to be robust against machine failures. This paper presents Piccolo, a ...
Indexing the Web - A Challenge for Supercomputers
Since January 2002, the Google search engine has been powering an average of 150 million web searches a day, with a peak of over 2000 searches per second. These searches are performed over an index of over 2 billion documents, over 300 million images, and over 700 million Usenet messages. To guarantee fast user response time, Google performs these searches on a cluster of over 10,000 PCs. The ...